Skip to content

[PowerPC] Check ResNo at end of BitPermutationSelector::Select32 #151429

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

kernigh
Copy link
Contributor

@kernigh kernigh commented Jul 31, 2025

If it optimizes away a permutation (rotate all 32 bits left by 0), the result might be from a SDNode with more than one result, such as a load node. The node replacement assumes result number 0. If it isn't 0, kludge by adding an extra node.

Fixes #133507


This isn't the best fix, because I added an extra register move. I wrote in the new test,

; The stw should store POINTER, not VALUE.
; CHECK:        lwzu [[VALUE:[0-9]+]], 28([[POINTER:[0-9]+]])
; CHECK:        mr [[MOVED:[0-9]+]], [[POINTER]]
; CHECK:        stw [[MOVED]], 0({{[0-9]+}})

It should lose the mr and just do stw [[POINTER]], 0({{[0-9]+}}). It would help me if someone can write a better fix.

@llvmbot
Copy link
Member

llvmbot commented Jul 31, 2025

@llvm/pr-subscribers-backend-powerpc

Author: George Koehler (kernigh)

Changes

If it optimizes away a permutation (rotate all 32 bits left by 0), the result might be from a SDNode with more than one result, such as a load <pre-inc> node. The node replacement assumes result number 0. If it isn't 0, kludge by adding an extra node.

Fixes #133507


This isn't the best fix, because I added an extra register move. I wrote in the new test,

; The stw should store POINTER, not VALUE.
; CHECK:        lwzu [[VALUE:[0-9]+]], 28([[POINTER:[0-9]+]])
; CHECK:        mr [[MOVED:[0-9]+]], [[POINTER]]
; CHECK:        stw [[MOVED]], 0({{[0-9]+}})

It should lose the mr and just do stw [[POINTER]], 0({{[0-9]+}}). It would help me if someone can write a better fix.


Full diff: https://github.com/llvm/llvm-project/pull/151429.diff

2 Files Affected:

  • (modified) llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp (+5)
  • (added) llvm/test/CodeGen/PowerPC/lwzu-i48.ll (+19)
diff --git a/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp b/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
index 415164fc9e2cb..710662dc107c5 100644
--- a/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
@@ -2337,6 +2337,11 @@ class BitPermutationSelector {
                         ANDIVal, ANDISVal), 0);
     }
 
+    // Caller assumes ResNo == 0, but we might have ResNo != 0 after
+    // optimizing away a permutation.  Kludge with an extra node.
+    if (Res.getResNo() != 0)
+      return CurDAG->getMachineNode(PPC::OR, dl, MVT::i32, Res, Res);
+
     return Res.getNode();
   }
 
diff --git a/llvm/test/CodeGen/PowerPC/lwzu-i48.ll b/llvm/test/CodeGen/PowerPC/lwzu-i48.ll
new file mode 100644
index 0000000000000..8205f5976e876
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/lwzu-i48.ll
@@ -0,0 +1,19 @@
+; RUN: llc -mtriple=powerpc-unknown-openbsd < %s | FileCheck %s
+
+; BitPermutationSelector in PPCISelDAGToDAG.cpp was taking the wrong
+; result of a load <pre-inc> after optimizing away a permutation.
+; Here, the big end of i48 %3 was %1 but should be %0.
+
+define i32 @hop(ptr %out, ptr %in) {
+entry:
+  %0 = getelementptr i8, ptr %in, i32 28
+  %1 = load i32, ptr %0, align 4
+  %2 = ptrtoint ptr %0 to i48
+  %3 = shl i48 %2, 16
+  store i48 %3, ptr %out, align 4
+  ret i32 %1
+}
+; The stw should store POINTER, not VALUE.
+; CHECK:        lwzu [[VALUE:[0-9]+]], 28([[POINTER:[0-9]+]])
+; CHECK:        mr [[MOVED:[0-9]+]], [[POINTER]]
+; CHECK:        stw [[MOVED]], 0({{[0-9]+}})

bob-beck pushed a commit to openbsd/src that referenced this pull request Aug 2, 2025
clang 19 was miscompiling its own llvm::MergeBasicBlockIntoOnlyPred
(Transform/Utils/Local.cpp).  A change between clang 18 and 19 exposed
a mistake in BitPermutationSelector (PPCISelDAGToDAG.cpp).  It
optimized away a permutation (rotate all 32 bits left by 0) and got
result number 1 from a load pre-inc SDNode, but the node replacement
assumed result number 0.  Kludge by adding an extra node.

llvm/llvm-project#133507
llvm/llvm-project#151429

This isn't the best fix.  I will replace it if LLVM commits a
different fix.

ok miod@
@brad0
Copy link
Contributor

brad0 commented Aug 2, 2025

cc @lei137 @amy-kwan @diggerlin

@kernigh
Copy link
Contributor Author

kernigh commented Aug 7, 2025

My fix is not complete. My patch fixes optimizing away a 32-bit permutation, but not a 64-bit permutation. See llc -mtriple=powerpc64-unknown-openbsd -debug-only=isel,ppc-isel with this input,

define i32 @hop(ptr %out, ptr %in) {
entry:
  %0 = getelementptr i8, ptr %in, i32 28
  %1 = load i32, ptr %0, align 4
  %2 = ptrtoint ptr %0 to i96
  %3 = shl i96 %2, 32
  store i96 %3, ptr %out, align 8
  ret i32 %1
}

This emits wrong code,

        mr      5, 3
        lwzu 3, 28(4)
        sldi 4, 4, 32
        std 3, 0(5)
        stw 4, 8(5)
        blr

The std 3, 0(5) stores the wrong result of lwzu 3, 28(4). It is storing %1 (from the IR) but should store %0. If I pass -debug-only=isel,ppc-isel, the optimized legalized selection DAG looks correct,

Optimized legalized selection DAG: %bb.0 'hop:entry'
SelectionDAG has 22 nodes:
  t0: ch,glue = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %0
  t24: i64 = shl t43:1, Constant:i32<32>
        t33: i64 = add nuw t2, Constant:i64<8>
      t34: ch = store<(store (s32) into %ir.out + 8, align 8), trunc to i32> t43:2, t24, t33, undef:i64
            t26: i64 = srl t43:1, Constant:i32<32>
          t39: i64 = PPCISD::SHL t26, Constant:i32<32>
          t40: i64 = PPCISD::SRL t24, Constant:i32<32>
        t41: i64 = or t39, t40
      t31: ch = store<(store (s64) into %ir.out)> t43:2, t41, t2, undef:i64
    t35: ch = TokenFactor t34, t31
  t17: ch,glue = CopyToReg t35, Register:i64 $x3, t43
    t4: i64,ch = CopyFromReg t0, Register:i64 %1
  t43: i64,i64,ch = load<(load (s32) from %ir.0), anyext from i32, <pre-inc>> t0, t4, TargetConstant:i64<28>
  t18: ch = PPCISD::RET_GLUE t17, Register:i64 $x3, t17:1

Notice that t41 = (t43:1 >> 32 << 32) | (t43:1 << 32 >> 32) is equal to t43:1. BitPermutationSelector finds that t41 is t43:1 rotated left by 0,

ISEL: Starting selection on root node: t41: i64 = or t39, t40
Considering bit-permutation-based instruction selection for:    t41: i64 = or t39, t40
	bit group for 0xb4929e8da10 RLAmt = 0 [0, 63]
		rotation groups for 0xb4929e8da10 RL: 0:
			isel using masking: 2 using rotates: 1

BitPermutationSelector should change t41 to t43:1, but it wrongly changes t41 to t43.

If BPS optimizes away a permutation (rotate all bits left by 0), the
result might be from an SDNode with more than one result, such as a
load pre-inc node.  The node replacement assumed result number 0.
Change it to use an SDValue with the correct result number.

Fixes llvm#133507
@kernigh
Copy link
Contributor Author

kernigh commented Aug 7, 2025

I forced-push a commit which fixes 64-bit permutation in my last comment, and also loses the extra register move in my 1st comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

clang 19 or 20 miscompiles llvm::MergeBasicBlockIntoOnlyPred for PPC32
3 participants